Fix JSON serialization in API request to avoid Unicode escaping for non-ASCII characters#297
Fix JSON serialization in API request to avoid Unicode escaping for non-ASCII characters#297gyl6666 wants to merge 3 commits into
Conversation
…on-ASCII characters When requesting a large language model, the openai_api_compatible plugin serializes and sends the request payload. When the content contains Chinese text, the request body is fully encoded as Unicode escape sequences (e.g., \uXXXX), which significantly increases token usage and may negatively impact the model's understanding.
Summary of ChangesHello, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request addresses an issue where non-ASCII characters in API requests to large language models were being unnecessarily escaped, leading to increased token usage and potential comprehension issues. By adjusting the JSON serialization settings, the system now preserves original character encoding, optimizing request payloads and enhancing model performance for multilingual content. Highlights
🧠 New Feature in Public Preview: You can now enable Memory to help Gemini Code Assist learn from your team's feedback. This makes future code reviews more consistent and personalized to your project's style. Click here to enable Memory in your admin console. Using Gemini Code AssistThe full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips. Invoking Gemini You can request assistance from Gemini at any point by creating a comment using either
Customization To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a Limitations & Feedback Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for GitHub and other Google products, sign up here. Footnotes
|
There was a problem hiding this comment.
Code Review
The pull request aims to fix an issue with JSON serialization where non-ASCII characters were escaped, increasing token usage. The approach of using ensure_ascii=False is correct, but its implementation is flawed. Using the json parameter of requests.post with an already serialized string will lead to double-encoding and a malformed request body. I've provided a critical-severity comment with a suggested fix to use the data parameter instead, which will correctly send the pre-serialized JSON.
| endpoint_url, | ||
| headers=headers, | ||
| json=data, | ||
| json=json.dumps(data, ensure_ascii=False), |
There was a problem hiding this comment.
Using the json parameter of requests.post with an already-serialized JSON string is incorrect. The requests library will serialize the provided string again, resulting in a double-encoded request body. For example, {'key': 'value'} would become '"{\"key\": \"value\"}"' in the request body, which is a JSON string containing another string, not the intended JSON object {"key": "value"}. This will likely cause the server to fail parsing the request.
To send a pre-serialized JSON string, you should use the data parameter and provide it with a bytes-encoded string. The Content-Type header is already correctly set to application/json earlier in the code, so this change will ensure the request is sent correctly.
| json=json.dumps(data, ensure_ascii=False), | |
| data=json.dumps(data, ensure_ascii=False).encode('utf-8'), |
To send a pre-serialized JSON string, we should use the data parameter and provide it with a bytes-encoded string Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
gyl6666
left a comment
There was a problem hiding this comment.
To send a pre-serialized JSON string, we should use the data parameter and provide it with a bytes-encoded string
send data with byte coded string.
gyl6666
left a comment
There was a problem hiding this comment.
send data with bytecoded string
Problem
When requesting a large language model, the
openai_api_compatibleplugin serializes and sends the request payload. When the content contains Chinese text, the request body is fully encoded as Unicode escape sequences (e.g.,\uXXXX), which significantly increases token usage and may negatively impact the model's understanding.Solution
ensure_ascii=False.[dify_plugin/interfaces/model/openai_compatible/llm.py]Testing
Pull Request Checklist
Compatibility Check
README.mdREADME.mdREADME.mdREADME.md(如果没更新版本号请保持 unchecked)Available Checks